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ABSTRACT . It  takes  of  the  order  of  N operations  to  solve  a set 
of  N linear  equations  in  N unknowns.  When  the  underlying  physical 
problem  has  some  time-  or  shift-invariance  properties,  the  coefficient 
matrix  is  of  Toeplitz  (or  difference  or  convolution)  type  and  the  equations 
can  be  solved  with  O(N^)  operations.  We  have  shown  that  with  any  non- 
singular  N X N matrix,  we  can  associate  an  Integer  a between  1 and 

N such  that  it  takes  O (N^OO  operations  to  invert  the  matrix.  The  number 
Of  may  be  small  for  many  non-Toeplitz  matrices  of  physical  interest.  Some 
aspects  of  this  result  are  discussed  here,  including  extensions  to 
continuous- time  kernels  and  integral  equations. 

1.  INTRODUCTION.  Problems  in  many  fields  lead  ultimately  to  the 
solution  of  linear  matrix  equations 

Ra  = m , 

where  R is  a given  N x N matrix,  say,  and  m is  a given  N x 1 vector. 

The  number  of  operations  required  to  solve  such  an  equation,  or  to  find 
-1  3 

R , is  of  the  order  of  N (multiplications  and  additions)  . This  can  be 
prohibitive  if  N is  large  (500  or  1000  or  3000,  as  can  arise  in  many 
power  system  or  econonometrlc  calculations).  For  this,  and  other  reasons, 
we  must  often  try  to  bring  in  any  special  features  or  structures  that  may 
be  present  in  the  original  physical  problem.  In  many  applications 

^This  report  is  a summary  of  a talk  given  at  the  22nd  Conference  of  Army 
Mathematicians,  Watervliet  Arsenal,  New  York,  May  1976.  It  was  based  on 
work  done  jointly  with  B.  Frledlander,  L.  Ljung  and  M.  Morf  (see  the 
references) . 

This  work  was  supported  by  the  Air  Force  Office  of  Scientific  Research, 
Air  Force  Systems  Command  under  Contract  AF44-620-74-C-0068 , and  in  part 
the  Joint  Services  Electronics  Program  under  Contract  X00014-75-C-0601 , 
and  the  National  Science  Foundation  under  Contract  NSF-Eng  75-18952. 

211 


we  have  the  property 


That  Is,  the  phenomena  are  Invariant  to  a change  in  the  time-  or  space- 

origin  (e.g.,  as  with  stationary  random  processes,  or  homogeneous  media, 

etc.).  In  this  case,  the  matrix  R is  said  to  be  a Toeplitz  matrix  and 

2 

ha^  the  nice  feature  that  its  inverse  can  be  found  with  only  0(N  ) multi- 
plications. Moreover  the  inverse  can  be  computed  recursively,  i.e.,  the 
N X N inverse  can  be  easily  updated  to  yield  the  (N  + 1)  x (N  + 1) 

Inverse,  and  Toeplitz  matrices  also  have  other  useful  properties. 

Unfortunately,  most  operations  on  Toeplitz  matrices  destroy  the  Toeplitz 
property.  For  example,  the  inverse  of  a Toeplitz  matrix  is  not  Toeplitz, 
unless  the  matrix  is  also  lower-  or  upper-triangular . So  also  the  product 
of  two  Toeplitz  matrices  is  not  Toeplitz,  unless  the  matrices  are  also 
both  lower-triangular  or  both  upper-triangular.  However,  some  reflection 
will  show  that  in  various  ways  one  can  regard  certain  matrices  as  being 

"less  non-Toeplitz"  than  others,  though  present  solution  methods  cannot 

2 

take  advantage  of  this — they  require  0(N  ) operations  in  the  Toeplitz 

3 

case,  and  0(N  ) otherwise. 

By  a long  process  of  abstraction  and  simplification  of  results  originally 

obtained  for  certain  nonlinear  differential  equations  [1  ],  [2  ],  we  have 

been  able  to  show  essentially  the  following  (more  precise  results  are 

stated  later) : with  any  invertible  N x N matrix  R we  can  associate 

2 

an  integer  Ct,  1 < a < N,  such  that  it  takes  0(N  0()  operations  to 
compute  its  inverse.  The  integer  OL  may  be  called  the  displacement  rank 
(or  index  of  nonstationarity)  of  the  matrix  and  has  the  property  that  it 
is  low  for  matrices  that  are  Toeplitz  or  near  to  Toeplitz,  while  it  is 
high  for  arbitrary  matrices.  For  example, 
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f 

1 


i) 

Q 

II 

for 

R 

= L or  U or  LU  or  UL,  where  L and  U 

denote 

lower- 

and  upper-triangular  Toeplitz  matrices. 

11) 

a = 2 

for 

R 

= (L  + U)  and  R = (L  + U)~^ 

111) 

a < 4 

for 

R 

= (L^  + + Ug) 

iv) 

a < 3 

for 

R 

= [L^  + ^ U^] 

V) 

a < n. 

if 

R 

is  the  covariance  matrix  of  a linear  combination 

of  the  components  of  any  n-vector  wide-sense  Markov  random 
process. 

2 3 

In  such  cases,  0(N  OO  can  often  be  significantly  less  than  0(N  ), 

thus  yielding  many  advantages,  not  Just  for  solving  a given  large  set  of 

equations,  but  also  for  interactive  adjustment  of  the  mathematical  model 

(l.e.,  of  R and  m)  based  on  actual  examination  of  the  now-more-easily 

determined  solution  a. 

We  shall  outline  our  major  results  in  Section  2,  for  matrix  equations. 

A similar,  and  somewhat  simpler,  development  can  be  carried  out  for 
Integral  equations,  as  noted  in  Section  3.  Section  4 contains  some  con- 
cluding remarks  on  possible  extensions  and  generalizations. 


2.  THE  MATRIX  CASE.  More  details  can  be  found  In  the  paper  [ 3], 
but  we  note  the  key  definitions  and  results  here. 

Definition  1.  The  (+) -displacement  rank  of  an  N x K matrix  R is  the 
smallest  Integer  a^(R)  such  that  we  can  write 


R = 


0(+(H) 

Z 

1 


Vi 


for  some  lower-triangular  Toeplitz  matrices  {L^^}  and  some  upper-triangular 
Toeplitz  matrices 

Definition  2.  The  (-) -displacement  rank  of  an  N x N matrix  R is  the 
smallest  integer  a (R)  such  that  we  can  write 


a (R) 


R = 


i^^i 


for  some  lower-triangular  Toeplitz  matrices  and  upper-triangular 

Toeplitz  matrices 
Definition  3.  Let 

Z = the  lower-shift  matrix 

0.  o 

c ' „ 

1 _ 

Lemma  1 . Computation  of  Displacement  Ranks 

a^<R)  = pU<R)} 

where 


1 


where 


! 


r(R)  = R - Z'RZ  . (2) 

The  proof  follows  by  using  the  result  of  Lemma  2. 

Lerama  2 . Given  two  column  vectors  x,y  there  is  one  and  only  one  solution 
of  the  functional  equation 

J(R)  = xy*  , (3a) 

and  this  is 

R = L(x)U(y')  , (3b) 

where  ' denotes  transpose,  L(x)  is  a lower- triangular  Toeplltz  matrix 
whose  first  column  is  x,  and  U(y')  is  an  upper-triangular  Toeplitz 
matrix  with  first  row  y' . 

Proof.  For  uniqueness^ note  that 

JCR^)  = J(R2) 

implies 

R,  - ZR,Z’  = R„  - ZR„Z' 

11  2 2 

or 

R^  - R^  = Z(R^  - R2)Z'  , 
whose  only  solution  is  clearly  zero. 

The  rest  amounts  to  verifying  that  jL(x)U(y')  = xy' , which  the 
reader  may  find  amusing  to  check  by  direct  computation  for  3x3  matrices,  fli 
Lemma  1 now  follows  easily  from  the  observation  that 

a a 

R = I L(x.)U(y')  ^ J(R)  = E X y!  (4) 

ill  1 ^ ‘ 

We  can  now  state  a first  simple^ but  apparently  new, result. 

Theorem  1 . 

C>_(R~^)  = :.:^(R)  . (5) 

Therefore, 


1 


a+(R) 

R = E L U (ga) 

1 1 
implies  that  R has  the  form 

_1  Ci^(R) 

R = Z 'U.C.  (6b) 

1 ^ ^ 

Proof.  We  give  the  simple  proof  (suggested  by  S-Y.  Kung)  because  it 
shows  that  the  result  is  quite  general  and  depends  very  little  on  the 
nature  of  the  entries  of  R — for  example,  they  could  themselves  be  matrices. 
We  note  that 

a_(R"S  = c(r"^  - Z'r'^Z) 

= o{(r”^  - Z'r'^Z)R) 

= c{I  - Z'r"^ZR) 

since  rank  is  unaffected  by  multiplication  by  a nonsingular  matrix.  Now 
by  a well-known  matrix  result  that 

p{I  - AB)  = p{I  - BA} 
we  can  continue  the  above  chain  as 

Ci£_(R~^)  = c{I  - zrz'r“^} 

= p{(i  - zrz’r"Sr} 

= c{R  - ZRZ* ) 

= 0:^(R)  . ■ 

Example.  If  T is  a symmetric  Toeplitz  matrix,  then  a (T)  = 2 = a (T) 
since  we  have  the  representations 


T = T •!  + I*T' 

+ + 

= I*T^  + T|*I  , 

where 

= the  lower-triangular  part 
of  the  matrix  T. 
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The  fact  that 


a^(T)  = 2 = a_(T) 


can  also  be  seen  by  checking  that 
J(T)  = T - ZTZ* 


. + t "1  f for  all  N > 2 

“-I  ^2  •••  h} 


o 


N 


and 


r(T)  = 


o 


N 


N-1 


N > 2 . 


_ N 2 1 

Now  it  turns  out  to  have  been  well-known  in  many  contexts  (see  the 
discussion  in  [4  ])  that  there  exist  two  lower-triangular  Toeplltz 
matrices  A and  B such  that 


-1 


= A’A  - B'B 


(7) 


so  that 


a_(T)  = 2 = a^(T)  . M 

Remark.  Notice  that  the  displacement  ranks  seem  to  identify  a better 
property  of  matrices  than  their  being  Toeplitz,  The  class  of  Toeplitz 
matrices  is  not  closed  under  inversion,  unlike  the  (+) -displacement  ranks 
and  the  corresponding  representations  (6  )• 

2 

Theorem  2 . The  inverse  of  an  N x N matrix  R can  be  found  with  0(N  CO 


can  be  done  via  certain  recursive  formulas  called  the  generalized  Szego 


Levinson  recursions. 

The  recursions  are  a bit  too  complicated  to  describe  here,  but  we 
may  note  that  for  Toeplitz  matrices  they  are  equivalent  to  the  well-knovm 
recursions  for  the  Szego  polynomials  orthogonal  on  the  unit  circle  (see, 
e.g.,  [5,  Ch.  11  ] or  [ 6])«  These  were  rediscovered  in  the  statistics 

literature  by  Levinson  [7  ] and  by  Durbin  [ g ] for  recursively  solving 
the  so-called  Yule-Walker  normal  equations  [ 9 ] . 

For  other  results  in  the  matrix  case,  we  refer  to  [3  ],  [lo]-[ll],  and 
instead  turn  briefly  here  to  an  examination  of  the  Integral  operator  case, 
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3.  INTEGRAL  EQUATIONS.  The  Fredholm  Integral  equation  of  the  second 


kind 


a(t)  + J"  K(t,s)a(s)  ds  = m(t)  , 0 < t < T 

0 


(8) 


has  been  extensively  studied,  see,  e.g,,  the  recent  monograph  [12]  . Except 
for  the  handful  of  cases  where  explicit  analytic  solution  Is  possible,  the 
generic  technique  Is  to  replace  the  Integral  equation  by  some  approximating 
set  of  N linear  equations 

Ra  = m . 

This  can  be  done  in  various  ways — use  of  degenerate  kernels,  projection 

(Galerkln  and  collocation)  methods,  etc.  For  example  In  the  degenerate 

kernel  method  we  replace  K(t,s)  by  the  function 

N ^ 

Kjj(t,s)  = £ 0^(t)it^^(s)  (9) 

for  some  suitably  chosen  functions  }•  In  any  case,  the 

3 

resulting  set  of  linear  equations  will  in  general  require  0(N  ) operations 
for  their  solution  and  this  may  be  prohibitively  large.  More  significant 
however  is  the  observation  that  such  approximation  methods  will  generally 
destroy  any  nice  structure  that  might  have  been  present  in  the  original 
problem. 

For  example,  if  the  kernel  was  of  Toeplitz  (also  called  displacement 
or  convolution)  type, 

K(t,s)  = K(t-s)  , say 

then  in  general 

Kjj(t,s)  y Toeplitz  for  N < ••  . 

This  is  bad,  because  the  Toeplitz  property  can  be  exploited  to  find  a 
nice  solution  of  the  integral  equation.  Briefly,  first  define 
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1 


H(t,s) 


the  Fredholm  resolvent  of  K(t,s) 


as  the  solution  of  the  integral  equation 

T 


t,r)K(r,s)  dr  = K(t,s)  , 0 < t,s  < T 


In  operator  notation,  we  can  write  this  as 


H + HK  = K 


(I  - H)  (I  + K)  = I 


which  shows  that  the  original  equation 


can  be  resolved  as 


(I  + K)a  = m 


a = (I  + K)  m = (I  - H)m 


a(t)  = m(t)  -J"  H^(t,s)m(s)  ds  . 

0 

Therefore  the  basic  problem  is  to  find  H(t,s).  Now  even  though  K(t  - s) 
may  be  Toeplltz,  this  will  not  in  general  be  true  of  its  resolvent  H^(t,s) 
(for  T < ••)  . Nevertheless  H^(t,s)  is  not  a completely  arbitrary  kernel, 
but  should  in  some  sense  be  close  to  a Toeplltz  kernel  (after  all,  its 
resolvent  is  Toeplltz) . 

We  can  quantify  this  intuitive  feeling  in  the  following  way  (the 
analog  of  the  method  used  in  Section  2) . Define  the  operator 


and  note  that 


jK(t,s)  = +|y(t,s)  . 


jK(t  - s)  s 0 . 


If  K(t,s)  is  not  Toeplitz  jK(t,s)  = 0,  but  it  will  be  some  function 
of  two  variables, which  we  can  write  as 
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a 


(11) 


jK(t,s)  = Z 0,  (t)>lf.(s) 

1 ^ ' 

for  some  functions  {0^(*)  ,'i^j^(*)  ) and  some  integer  a,  possibly  even 

infinite.  However  let  us  define  the  displacement  rank  of  K(t,s)  as 

the  smallest  integer  a(K)  such  that  the  representation  (11)  is  possible. 

Examples . i)  K is  Toeplitz,  a(K)  = 0. 

ii)  K(t,s)  = min(t,s),  the  covariance  of  the  simplest 
nonstationary  random  process,  the  Wiener  process. 

Clearly  JlK(t,s)  s 1 and  a = 1. 

ill)  K(t,s)  = ts  - min(t,s),  the  covariance  of  the 
so-called  Brownian  bridge  process.  Now 
jK(t,s)  = s + t - 1 and  a = 2.  B 

We  can  show  the  following  result,  analogous  to  Theorem  1 in  the  matrix 

case. 

Theorem  3.  a(Hp(t,s))  < a(K(t,s)  + 2. 

Example . When  K is  Toeplitz,  a(K)  = 0.  However  even  though  its 

resolvent  H^(t,s)  is  not  Toeplitz,  there  exist  two  functions  A^(*), 

B_(*)  such  that 
T 

JHj,(t,s)  = A,j.(t)A,j.(s)  - Bj,(t)B,j,(s),  (12)^ 

so  that 

a(H,j.(t,s))  = 2 . 

Moreover  the  functions  A,j,(*)  and  , of  one  variable,  can  be 

determined  more  easily  than  functions  of  two  variables.  In  fact  they 
can  be  obtained  via  the  differential  equations 


,j.(t)  = - B,j,(t)B^{T)  , 


0 < t < T 


(13a) 


B,j,(t)  = - .\j,(t)B,j.(t)  , 0 < t <T  (13b) 

with  certain  easily  determined  boundary  conditions  A^(0)  and  B,j,(T) . 


'This  is  the  analog  of  (7)  in  the  matrix  case, 
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The  point  is  that  these  differential  equations  can  be  solved  by  a simple 

2 

recursive  procedure,  which  needs  only  proportional  to  N operations, 
where  N is  the  number  of  poi-iis  in  [0,T]  used  in  any  discretization 
procedure. 

• • 

We  call  (13)  Krein-Szego- Levinson  equations  because  they  are  exactly 
the  recursions  found  by  Krein  [ 13]  for  the  continuous  analogs  of  the 

Szego  polynomials  on  the  unit  circle. 

Theorem  4.  If  K(t,s)  has  displacement  rank  a,  H,j(t,s)  can  be  found 

with  a times  as  much  computation  as  in  the  Toeplitz  case.  The  solution 

•• 

is  found  recursively  via  a set  of  generalized  Krein- Szego-Levinson 
equations. 

Proofs  and  further  results  can  be  found  in  the  papers  [ 14]-[ 15] . 

However,  we  might  draw  explicit  attention  to  the  fact  that  though  we 

are  using  a degenerate-kernel  representation  in  (11),  this  is  for  jK(t,s) 

and  not  for  K(t,s),  Even  though  jK(t,s)  is  degenerate  it  can  be  seen 

by  integration  that,  in  operator  notation, 

OH-l 

K = Z L.U. 

1 ^ ^ 

where  the  {L^)  and  {U^)  are  lower-  and  upper- Vol terra  operators. 
Therefore  K can  be  very  far  from  a degenerate  kernel.  The  feature 
of  our  method  is  that  it  preserves  any  "Toeplitz-like"  structure  that  may 
be  present  in  K(t,s).  This  thought  is  pursued  a bit  further  in  Section  4, 


4.  CONCLUDUtG  REMARKS.  We  have  taken  Toeplltz  kernels  as  basic 


because  they> or  things  close  to  them, arise  In  many  applications  of  Interest 
to  us.  However  In  other  problems,  other  "nice"  kernels  may  be  more  basic. 
For  example,  we  might  have  Hankel  kernels 

K(t,s)  = K(t  + s)  , say  . 

Integral  equations  with  such  kernels  can  be  solved  efficiently,  and  there- 
fore it  may  be  of  interest  to  classify  kernels  in  terms  of  their  degree 
of  "non-Hankelness" . This  can  clearly  be  done  as  above  by  using  the 
operator 

(It  ■ ’ 

which  gives  zero  when  applied  to  Hankel  kernels.  Similar  results  can 
also  be  obtained  for  basic  kernels  of  the  form  Kj^(t  - s)  + K2(t  + s). 

Furthermore  we  could  also  define  "second"  and  higher-order  operators 
of  the  type 

/(K(t,s>)  = 

and  so  on.  It  is  easy  to  find  examples  where  these  are  particularly 
appropriate. 

As  a final  comment,  we  should  express  our  feeling  that  the  basic 
ideas  described  above  should  be  adaptable  to  a variety  of  different 
situations.  Also  there  is  clearly  some  quite  general  algebraic  structure 
lurking  behind  our  results,  which  some  of  the  people  in  this  audience 
may  be  better  equipped  to  identify  than  we  can. 
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